As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in L2RL and making L2RL useful for practical applications requires more than developing individual L2RL algorithms; it requires making progress at the systems-level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing different aspects of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API allowing easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system. We also introduce an evaluation environment in order to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of Starcraft-2 minigames and allows for the fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.
In recent years, advances in deep learning have resulted in a plethora of successes in the use of reinforcement learning (RL) to solve complex sequential decision tasks with high-dimensional inputs. However, existing RL-based systems are essentially competency-unaware: they lack the interpretation mechanisms needed to give human operators an insightful, holistic view of their competency. This impedes their adoption, particularly in critical applications where the decisions an agent makes can have significant consequences. In this paper, we extend a recently-proposed framework for explainable RL that is based on analyses of "interestingness." Our new framework provides various measures of RL agent competence stemming from interestingness analysis and is applicable to a wide range of RL algorithms. We also propose novel mechanisms for assessing RL agents' competencies that: 1) identify agent behavior patterns and competency-controlling conditions by clustering agent behavior traces solely using interestingness data; and 2) identify the task elements most responsible for an agent's behavior, as measured through interestingness, by performing global and local analyses using SHAP values. Overall, our tools provide insights about RL agent competence, both their capabilities and limitations, enabling users to make more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.
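In recent years, significant progress has been made in explainable AI, as the need to understand deep learning models has become increasingly important for trust in AI and for AI ethics. Comprehensible models for sequential decision tasks are a particular challenge, because they require understanding not only individual predictions but also a series of predictions that interact with the environment dynamics. We present a framework for learning comprehensible models of sequential decision tasks in which agent strategies are characterized using temporal logic formulas. Given a set of agent traces, we first cluster the traces using a novel embedding method that captures frequent action patterns. We then search for logical formulas that explain the agent strategies in the different clusters. We evaluate our framework on combat scenarios in StarCraft II (SC2), using traces from a handcrafted expert policy and from a trained reinforcement learning agent. We implement a feature extractor for the SC2 environment that extracts traces, from agent replays, as sequences of high-level features describing both the state of the environment and the agent's local behavior. We further design a visualization tool that depicts the movement of units in the environment, which helps in understanding how different task conditions lead to distinct agent behavior patterns in each trace cluster. Experimental results show that our framework is able to separate agent traces into distinct groups of behaviors, and that our strategy inference method produces consistent, meaningful, and easily understood strategy descriptions.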
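One approach to meeting the challenges of deep lifelong reinforcement learning (LRL) is to carefully manage the agent's learning experiences, in order to learn (without forgetting) and to build internal meta-models (of the tasks, environments, agents, and world). Generative replay (GR) is a biologically inspired replay mechanism that augments learning experiences with self-labeled examples drawn from an internal generative model that is updated over time. In this paper, we present a version of GR that satisfies two desiderata: (a) introspective density modeling of the latent representations of policies learned using deep RL, and (b) model-free end-to-end learning. In this work, we study three deep learning architectures for model-free GR. We evaluate our proposed algorithms on three different scenarios comprising tasks from the Starcraft2 and Minigrid domains. We report several key findings showing the impact of the design choices on quantitative metrics, including transfer learning, generalization to unseen tasks, fast adaptation after a task change, performance comparable to a task expert, and minimization of catastrophic forgetting. We observe that our GR prevents drift in the feature mapping from the latent vector space of a deep critic agent. We also show improvements in established lifelong learning metrics. We find that, when combined with the replay buffer and the generated replay buffer, a small random replay buffer needs to be introduced to significantly increase the stability of training. Overall, we find that "hidden replay" (a well-known architecture for class-incremental classification) is the most promising approach, and it pushes the state of the art in GR for LRL.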
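We present a novel generative method for producing unseen and plausible counterfactual examples for reinforcement learning (RL) agents, based on outcome variables that characterize agent behavior. Our approach uses a variational autoencoder to train a latent space that jointly encodes information about the observations and the outcome variables pertaining to an agent's behavior. Counterfactuals are generated using traversals in this latent space, via gradient-driven updates as well as latent interpolations against cases drawn from a pool of examples. These include updates that raise the likelihood of generated examples, which improves the plausibility of the generated counterfactuals. From experiments in three RL environments, we show that these methods produce counterfactuals that are more plausible and closer to their queries than purely outcome-driven or case-based baselines. Finally, we show that a latent space jointly trained to reconstruct both the input observations and the behavioral outcome variables produces higher-quality counterfactuals than a latent space trained to reconstruct only the observation inputs.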
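Before delegating a task to an autonomous system, a human operator may want guarantees about the behavior of the system. This paper extends previous work on conformal prediction for functional data and on conformalized quantile regression to provide conformal prediction intervals over the future behavior of an autonomous system executing a fixed control policy on a Markov Decision Process (MDP). The prediction intervals are constructed by applying conformal corrections to prediction intervals computed by quantile regression. The resulting intervals guarantee that, with probability $1-\delta$, the observed trajectory will lie inside the prediction interval, where the probability is computed with respect to the starting state distribution and the stochasticity of the MDP. The method is illustrated on MDPs for invasive species management and Starcraft2 battles.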
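The conformal correction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a split-conformal setup with a held-out calibration set of whole trajectories, and it scores each trajectory by the largest amount it escapes its quantile-regression band at any time step. The function names `conformal_correction` and `conformalize` are hypothetical.

```python
import math


def conformal_correction(cal_lower, cal_upper, cal_observed, delta):
    """Return the width q to add to each side of a quantile-regression band.

    Each calibration item is a whole trajectory (a list of per-step values).
    Assumption: the nonconformity score of a trajectory is the largest amount
    by which it escapes its predicted band at any step (negative if it always
    stays strictly inside).
    """
    scores = []
    for lo, hi, obs in zip(cal_lower, cal_upper, cal_observed):
        scores.append(max(max(l - y, y - h) for l, h, y in zip(lo, hi, obs)))
    scores.sort()
    n = len(scores)
    # Finite-sample rank used in split conformal prediction.
    rank = math.ceil((n + 1) * (1 - delta))
    if rank > n:
        # Too few calibration trajectories for this delta: only the
        # trivial (infinite) band carries the guarantee.
        return math.inf
    return scores[rank - 1]


def conformalize(lower, upper, q):
    """Widen a predicted band by q at every time step."""
    return [l - q for l in lower], [u + q for u in upper]


# Toy calibration set: three trajectories of two steps each.
cal_lower = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
cal_upper = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
cal_observed = [[0.5, 0.5], [1.5, 0.5], [0.5, -0.25]]

q = conformal_correction(cal_lower, cal_upper, cal_observed, delta=0.5)
band = conformalize([0.0, 0.0], [1.0, 1.0], q)
```

Under the usual exchangeability assumption between calibration and test trajectories, widening the band by `q` at every step gives trajectory-level coverage of at least $1-\delta$, because a trajectory stays inside the widened band exactly when its score is at most `q`.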
Missing values are a common problem in data science and machine learning. Removing instances with missing values can adversely affect the quality of further data analysis. This is exacerbated when there are many more features than instances, so that the proportion of affected instances is high. Such a scenario is common in many important domains; for example, single nucleotide polymorphism (SNP) datasets provide a large number of features over a genome for a relatively small number of individuals. To preserve as much information as possible prior to modeling, a rigorous imputation scheme is acutely needed. While Denoising Autoencoders are a state-of-the-art method for imputation in high-dimensional data, they still require enough complete cases to train on, and these are often not available in real-world problems. In this paper, we consider missing value imputation as a multi-label classification problem and propose Chains of Autoreplicative Random Forests. Using multi-label Random Forests instead of neural networks works well for low-sampled data, as there are fewer parameters to optimize. Experiments on several SNP datasets show that our algorithm effectively imputes missing values based only on information from the dataset and exhibits better performance than standard algorithms that do not require any additional information. In this paper, the algorithm is implemented specifically for SNP data, but it can easily be adapted for other cases of missing value imputation.
The literature on machine learning in the context of data streams is vast and growing. However, many of the defining assumptions regarding data-stream learning tasks are too strong to hold in practice, or are even contradictory such that they cannot be met in the contexts of supervised learning. Algorithms are chosen and designed based on criteria which are often not clearly stated, for problem settings not clearly defined, tested in unrealistic settings, and/or in isolation from related approaches in the wider literature. This puts into question the potential for real-world impact of many approaches conceived in such contexts, and risks propagating a misguided research focus. We propose to tackle these issues by reformulating the fundamental definitions and settings of supervised data-stream learning with regard to contemporary considerations of concept drift and temporal dependence; and we take a fresh look at what constitutes a supervised data-stream learning task, and a reconsideration of algorithms that may be applied to tackle such tasks. Through and in reflection of this formulation and overview, helped by an informal survey of industrial players dealing with real-world data streams, we provide recommendations. Our main emphasis is that learning from data streams does not impose a single-pass or online-learning approach, or any particular learning regime; and any constraints on memory and time are not specific to streaming. Meanwhile, there exist established techniques for dealing with temporal dependence and concept drift, in other areas of the literature. For the data streams community, we thus encourage a shift in research focus, from dealing with often-artificial constraints and assumptions on the learning mode, to issues such as robustness, privacy, and interpretability which are increasingly relevant to learning in data streams in academic and industrial settings.
Variational autoencoders and Helmholtz machines use a recognition network (encoder) to approximate the posterior distribution of a generative model (decoder). In this paper we study the necessary and sufficient properties of a recognition network so that it can model the true posterior distribution exactly. These results are derived in the general context of probabilistic graphical modelling / Bayesian networks, for which the network represents a set of conditional independence statements. We derive both global conditions, in terms of d-separation, and local conditions for the recognition network to have the desired qualities. It turns out that for the local conditions the property perfectness (for every node, all parents are joined) plays an important role.
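The perfectness property mentioned above (for every node, all parents are joined) can be checked directly on a DAG. The sketch below is illustrative only: the representation of the graph as a dictionary from node to parent set, and the function name `is_perfect`, are assumptions made for the example.

```python
from itertools import combinations


def is_perfect(parents):
    """Check perfectness of a DAG given as {node: set of parent nodes}.

    A DAG is perfect if, for every node, every pair of its parents is
    joined by an edge (in either direction).
    """
    def adjacent(a, b):
        return a in parents.get(b, set()) or b in parents.get(a, set())

    return all(
        adjacent(a, b)
        for pa in parents.values()
        for a, b in combinations(pa, 2)
    )


# Collider a -> c <- b with unconnected parents: not perfect.
collider = {"a": set(), "b": set(), "c": {"a", "b"}}

# Same graph with the extra edge a -> b: the parents of c are joined.
moralized = {"a": set(), "b": {"a"}, "c": {"a", "b"}}

# A chain a -> b -> c has at most one parent per node: trivially perfect.
chain = {"a": set(), "b": {"a"}, "c": {"b"}}
```

Here `is_perfect(collider)` is false because `a` and `b` are both parents of `c` but share no edge, while the other two graphs satisfy the property.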
Scholarly text is often laden with jargon, or specialized language that divides disciplines. We extend past work that characterizes science at the level of word types, by using BERT-based word sense induction to find additional words that are widespread but overloaded with different uses across fields. We define scholarly jargon as discipline-specific word types and senses, and estimate its prevalence across hundreds of fields using interpretable, information-theoretic metrics. We demonstrate the utility of our approach for science of science and computational sociolinguistics by highlighting two key social implications. First, we measure audience design, and find that most fields reduce jargon when publishing in general-purpose journals, but some do so more than others. Second, though jargon has varying correlation with articles' citation rates within fields, it nearly always impedes interdisciplinary impact. Broadly, our measurements can inform ways in which language could be revised to serve as a bridge rather than a barrier in science.
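As an illustration of what an interpretable, information-theoretic measure of discipline-specific vocabulary can look like, the sketch below computes normalized pointwise mutual information (NPMI) between a word type and a field from raw counts. Choosing NPMI here is an assumption made for the example; the abstract does not name the specific metric, and the function name `npmi` and the counts are hypothetical.

```python
import math


def npmi(count_word_in_field, count_word, count_field, count_total):
    """Normalized PMI between a word type and a field, in [-1, 1].

    count_word_in_field: occurrences of the word in the field's text
    count_word:          occurrences of the word in the whole corpus
    count_field:         total tokens in the field's text
    count_total:         total tokens in the whole corpus
    """
    if count_word_in_field == 0:
        return -1.0  # the word never occurs in this field
    if count_word_in_field == count_total:
        return 1.0  # degenerate corpus: every token is this word in this field
    pmi = math.log(
        (count_word_in_field * count_total) / (count_word * count_field)
    )
    return pmi / -math.log(count_word_in_field / count_total)


# A word spread evenly across fields carries no field-specific signal.
generic = npmi(count_word_in_field=10, count_word=20,
               count_field=100, count_total=200)

# A word that occurs only in one field scores positive for that field.
specific = npmi(count_word_in_field=10, count_word=10,
                count_field=100, count_total=200)
```

A score near 0 means the word is about as frequent in the field as in the corpus overall, and a score approaching 1 means the word is concentrated in that field. The same computation could be applied to induced word senses instead of word types.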